The objective of this report is to give an detailed description of individual steps of the data analysis
The compilation of a pollen dataset for our analysis is performed a priori. Raw pollen datasets are carefully selected with the R-Fossilpol package, and the guidelines to the workflow are well described in Flantua et al. 2023 and in our website Fossilpol project. Most datasets are obtained from the Neotoma Paleoecology Database. Some additional data are from private owners in areas with data gaps, which we have limited access to use. We do not have the intellectual property rights to make these data public available. Therefore, only the derivatives of the analysis can be publicly shared. Table 1 provides a summary of the settings used in the FOSSILPOL workflow to obtain a standardised project dataset. This input data are further filter during the data processing steps in HOPE to get the final collection of a standardized dataset of high data quality we can use further in our data analyses.
An important step in FOSSILPOL to obtain a standardised pollen data
set within and across regions is the harmonisation of pollen types.
Different analysts have different backgrounds and schools using
different nomenclature, and the level of pollen taxonomic
identifications and names can vary widely. To be able to make numerical
comparisons of different pollen records, the level of pollen taxonomy
should be similar. Consequently, pollen harmonisation tables have been
produced for different regions to try to minimise biases related to
this. The regional harmonisation tables created in our project are for
Europe, Levant, Siberia, Southern Asia, Northern America, Latin America,
and the Indo-Pacific region (Birks et al. harmonisation paper). These
tables are used as input in the Fossilpol workflow above (see
Fossilpol step_by_step guide).
We have applied a number of filtering criteria to obtain as high a data quality as possible so that we can compare the numerical estimates on standardised data sets. These filtering criteria are: remove potentially duplicated pollen records, sorting levels (samples) by age, remove levels (samples) lower than a threshold of total number pollen grains counted (= pollen sum), remove pollen records based on age (minimum and maximum age ranges), remove levels (samples) depending on the age of the last control point, remove samples beyond the age ranges of interest, and remove pollen records if the total number of samples (N) is too low.
This filtering is done on the chronologies, raw pollen counts, harmonised pollen counts, and the age uncertainties from the age-depth models (Bchron). The preferable number of minimum pollen grains is set to 150, but this led to a great loss of datasets in regions with less data coverage, and we therefore reduced this number to 25 with the condition that less than 50 % of the samples must have a low pollen sum. This allow us to keep more datasets, but in the cases pollen records have a low minimum pollen sum, we acknowledge that the estimates of pollen assemblage properties (PAPs) are less robust. The maximum age beyond extrapolation is set to 3000 years because ages extrapolated beyond this threshold is considered highly uncertain. Finally, pollen records with less than 5 samples are removed for further analyses.
Let’s select a one record and use it as an example.
To determine the impact of past humans on fundamental ecosystem properties, we need to develop indicators of past human presence and activity. This led to the development of a new method, where we use human event detection and indicators identified from pollen records based on expert knowledge, combined with the method for quantifying human presence based on radiocarbon dates derived from archaeological artifacts and Summed Probability Densities (SPD) (Bird et al. 2022). We believe that this solves the issue that we can use a standardised variable as indicator of past human impact, and partially avoids the difficulty of creating standardised variables to detect human disturbance events across different regions and continents. This may reduce the potential circularity of human detection events derived from the same pollen records as the estimates of ecosystem properties.
For each pollen record, we have detected periods of human presence from the pollen data. Two methods have been used:
First, a pollen diagram of each pollen record has been examined by a regional expert and the age of each event type has been recorded.
#>
#> ------------------------------------------------
#> Region Event.type
#> --------------- --------------------------------
#> North America BI = Before Impact; FC = First
#> Cultivation; ES = European
#> Settlement
#>
#> Europe BI = Before Impact; FI = First
#> Indication; FCu = First
#> Cultivation; EC = Extensive
#> Clearance; CC = Complete
#> Clearance
#>
#> Asia BI = Before Impact; FI = First
#> Indication; FCu = First
#> Cultivation; EI = Extensive
#> Impact
#>
#> Indopasific no_impact = No Impact; weak =
#> Weak Impact; medium = Medium
#> Impact; strong = Strong Impact
#> ------------------------------------------------
#>
#> Table: Table 2: Type of human events identified in pollen diagrams
Note that the event types are uniquely defined within continents, and event types with the same name have different meanings between continents.
Second, an algorithm is made to obtain the binary variables (0/1) associated with each event type that is identified in each pollen record. A new vector with the average ages in between levels (samples) of the identified event type was created because the time of the events is assumed to have occurred prior to the changed event. A new matrix was created that uses the new age vector with the events type. The different event types are assigned binary values (0/ 1) depending on if the event type is present (i.e. when the age of the specific event is detected). Ultimately, logical principles were followed and the binary values were adjusted to get the event data for each pollen record. If no human event is recognised, it does not necessarily mean that humans were absent, but instead that there was not enough information to identify human activity in the pollen records.
Below is a figure that intends to represent the data associated with human events in example record. The different colours represent the differnet types of human events in each region. The smooth trend lines represent simple binomial GAM models that show the main trends and alteration in the timing of events over time.
The quantification of summed probability distribution (SPD). requires a distance to be selected around each site location to collect the relevant dates of archaeological artefacts around it. This will limit the area of human presence and indirectly the amount of human activity relevant to pollen records from each site.
We used the global dataset of radicarbon dates (RC dates) of archeological artefacts from Bird et al. 2022.
We have gathered all radiocarbon dates up to 500 km away and split them into groups (categories) by the distance to the sequence.
Only RC dates with valid geographical location (longitude and
latitude), and ‘LocAccuracy’ > 0 were selected. For each pollen
record, RC dates were classified by the geographical distance to the
pollen record. The chosen distance classes were: 5, 25, 50, 100, 250,
500 km. For each distance category, we used all RC dates up to the
maximum distance of selected category. For example, 250
contains all RC dates up to 250 km including 5,
25,…
However, distance class with less than (250 RC dates) (a Threshold) is filtered out in order to maintain robust SPD estimation.
For each distance category, SPD is estimated using spd
function from rcarbon package for each distance class for
each year between a minimum threshold age and 12 ka.
Radiocarbon dates were calibrated using calibrate
function from rcarbon package with appropriate calibration
curves (“IntCal20”, “ShCal20”, “mixed”). Calibration curves were
obtained rcarbon package and “mixed” was created using
rcarbon::mixCurves function with ‘p’ = 0.5. Calibration
curves were assigned by their geographical location following Hua et
al., 2013.
For the selected seqeunces intcal20 have been used.
The temporal distribution of SPDs constructed by diferent distances:
The minimum threshold ages are different for different regions and are decided based on the availability of radiocarbon dating for different regions. In general there is a bias that radiocarbon dating is rather limited on younger material where in many regions there are a lack of C14 data during the last 2000 years. Table 5 show the ages where data younger than age_from where removed.
#>
#> --------------------------
#> region age_from
#> --------------- ----------
#> Europe 2000
#>
#> Latin America 2000
#>
#> Asia 2000
#>
#> North America 500
#>
#> Oceania 500
#> --------------------------
In order to select the best distance, we want to know, which SPD curve is the best at explaining the changes in pollen events.
Therefore, for each distance, we calculated Redundancy Analysis (RDA) with individual event values as response and SPD as predictors.
Here is a visualisation of the data to fit:
Or in a form of table:
#>
#> ----------------------------------------------------------------------------
#> age no impact first impact extensive clearince complete clearince
#> ------ ----------- -------------- --------------------- --------------------
#> 3790 0 1 0 0
#>
#> 6767 1 0 0 0
#>
#> 7015 1 0 0 0
#>
#> 7262 1 0 0 0
#>
#> 7869 1 0 0 0
#>
#> 8236 1 0 0 0
#> ----------------------------------------------------------------------------
#>
#> Table: Table continues below
#>
#>
#> ----------------------------------------------------------
#> first cultivation 50 100 250 500
#> ------------------- --------- --------- --------- --------
#> 0 0.02323 0.06117 0.2413 0.9766
#>
#> 0 0.01443 0.0655 0.3516 1.124
#>
#> 0 0.01545 0.05377 0.2674 0.9394
#>
#> 0 0.01738 0.04415 0.2374 0.673
#>
#> 0 0.01178 0.02182 0.1267 0.2665
#>
#> 0 0.00652 0.01761 0.08569 0.2014
#> ----------------------------------------------------------
We can plot the individual ordianation but it is not very informative. The size and ocupancy of points corespond with their age.
We can evaluate the amount of variability explained by each ordination. We then extracted adj.r2 for each SPD distnce:
Therefore we have selected 500 as the best distance and the prefered SPD curve for 4197 sequence
Here is a overview of the status of records base on the presence of humna impact from pollen diagrams and RC data
Now we can apply the selection of best distance to all sequences which fullfil such criteria: 1. Have events 2. Have at least one SPD constructed
We can summary the best distance per each continent
To prepare the response variables of our main pollen dataset compilation and to be able to analyse fundamental ecosystem properties, we prepared the standard estimates of pollen assemblage properties (PAP) (Bhatta et al. 2023). The PAP estimations provide different aspects of pollen assemblage diversity which includes palynological richness, diversity and evenness, compositional change and turnover, and Rate-of-Change (RoC).
These response variables are calculated using the newly developed R-Ecopol
package that contain all the functions needed to estimate PAPs for
our pollen data assembly. The base functions used in this package are
derived from other dependency packages such as mvpart
package (Therneau et al. 2014) to estimate pollen zonations with
multivariate regression trees, vegan (Oksanen et al. 2022)
for other mutivariate techniques and dissimilarity indices functions
from iNext (Chao et al. 2014) that have been modified to
extract interpolated Hill numbers based on a minimum sample size, and
newly developed R functions to run DCCA using Canoco 4.5
(ter Braak xxxx) to list a few, among other, dependency packages.
Additionally, R-Ratepol (Mottl 2021) is used get the
estimates of RoC
The different aspects of palynological diversity are estimated using Hill’s effective species numbers N0, N1, N2, and the associated evenness ratios of N2/N1 and N1/N0. These are combined through one equation where the effective species numbers differ mainly in how the rare taxa are weighted in the parameter q:
\[^q{D} = (\sum_{i=1}^{S} p_{i}^{q})^{1/(1-q)}\]
When q is 0, rare and abundant taxa have equal weight and the number is simply the number of taxa in the sample. The equation is not possible to define for q = 1, but as it approaches 1, it is equal to the exponential of the well-known Shannon index and reports the number of equally common taxa. When q = 2, it is the same as the inverse Simpson diversity index and provides the number of equally abundant taxa with a low weight on rare taxa. The advantage of using effective species numbers is that they provide easily interpretable units and contain the doubling effect. To standardize the sample sizes, we use the rarefaction approach developed by Chao et al. These estimates are rarefied to the number of n = 150 grains, or in some cases to a lower sum (minimum n = 25). Some pollen records were only available as pollen percentages, and as the sample size is unknown, these are then rarefied to the minimum sum of percentages. The evenness ratios will be 1 if all taxa are equally abundant, and the ratios hence indicate changes in abundances between the numbers of rare, equally common, and abundant taxa.
We acknowledge that even though attempts are made to standardise richness and diversity estimates based on standard sample size, there are additional biases that are not taken into consideration such as differences in total pollen production and pollen representation (Odgaard 1998, 2001). In some cases, the total pollen sum is also too low to be considered a robust estimate, but it was a choice made on balancing loosing too much information from geographical areas with less data coverage (see data filtering above).
Compositional change is calculated using multivariate regression trees (MRT) with age as the constraining variable. MRT is in general a robust tool to explore and predict changes in multivariate data sets using environmental predictor variables (De’ath, Simpson and Birks 2012). This technique has been adopted in palaeoecology to detect major zones in pollen diagrams or shifts between periods of homogeneous vegetation in time (Simpson and Birks 2012). We use the pollen taxa in percentages without any data transformations as the response and the median ages derived from the age-depth model as the constraining variable. The recursive partitioning are based on chi-square distances between pollen samples constrained by time. The number of cross-validation is set to 1000, and the optimal sized tree is chosen based on the 1SD rule (Simpson and Birks 2012).
Compositional turnover is estimated using detrended canonical correspondence analysis (DCCA) with age as the explanatory variable (ter Braak and Smilauer 2007?). Changes in Weighted average (WA) sample scores (CaseR scores sensu ter Braak and Smilauer 2012) are measures of compositional turnover in standard deviation (SD) units (Birks 2007). The WA scores are regressed with time using a second-order polynomial (age+age^2) to allow more flexibility in the turnover pattern within a pollen record. Total compositional turnover is a measure of the total length of CaseR scores along the DCCA axis 1, whereas the pattern within a record is the measures between the individual samples along the DCCA axis 1. The response data are pollen percentages without any transformation to maintain the chi-square distances between samples, whereas the ages are the median ages derived from the age-depth model for each site.
Rate-of-change for the pollen assemblages in the pollen records are quantified using the novel R-Ratepol package (Mottl et al. 2021). RoC is estimated using moving windows of 500 years’ time bins of five number of windows shifts where samples are randomly selected for each bin. This approach is shown to increase the correct detection of RoC peak-points than the more traditional approaches (Mottle et al. 2021). RoC are reported as dissimilarity per 500 years using the Chord dissimilarity coefficient. Sample size is standardized in each working unit to 150 grains or the lowest number detected in each dataset. We use only the RoC scores further in the analyses.
Change-points detection of all the PAP variables are calculated using conventional regression trees (RT) for single variables with Euclidean distances. The transitions between the resulting groups (or zones) per variable is detected and saved as binary (0/1) variables. A change-point is defined as 1, where the mean ages between the two consecutive samples are used as the timing of this significant change. This is done individually for each PAP variable.
The significant change points of the richness, diversity, and evenness variables are combined into one variable, and the significant change points of compositional turnover, compositional change, and rate-of-change is combined in a second variable. The density of these two variables are calculated using a Gaussian kernel,and re-scaled to each of specific age ranges for each individual pollen record (i.e. minimum and maximum ages). To solve the boundary issue in density estimation the data is reflected to 0. We extract the interpolated values at every 500 years time step.
All response variables have been estimated using the harmonised pollen records for each location. To obtain estimates of equal spacing of 500 years, we used linear interpolation. In the context of the temporal analysis we analyse samples distributed in space across time, and equal time steps are necessary.
In order to choose a method of interpolation to obtain data on equal time steps, we compared generalise additive models (GAM), hierarchical generalised additive models (HGAM), and simple linear interpolation. By applying linear interpolation, we found that the correlation structure between the multivariate response variables are more similar to the original estimates without equal spacing than applying a GAM or HGAM. The GAM or HGAM models sometime showed unexpected patterns in single PAP estimate that changed these correlations. Since we cannot individually assess all single models for each of the variable in all of the records (>1000), we choose the simplest linear interpolation method. Similar issues were detected when estimating the density variables of changes points. The first approach was to estimate densities of the individual change points, and then use hierarchical generalised additive models (HGAM) to find the common pattern between the two groups representing significant changes in richness, diversity, and evenness, or change in pollen assemblages (MRT, RoC, DCCA1). As some of the models did not converge and showed inconsistent patterns we use the density estimates on the combined variables directly and extracted the interpolated values at every 500 years step.
Paleoclimate from the CHELSA-TraCE21k downscaling algorithm is
downloaded from the CHELSA database (Karger et al. 2021, Karger et
al. 2021). The selected bioclimatic variables are annual mean
temperatures ℃ (bio1, temp_annual), minimum temperatures of
coldest month ℃ (bio6, temp_cold), precipitation of warmest
quarter kg m-2 quarter-1 (bio18, prec_summer) and
precipitation of coldest quarter kg m-2 quarter-1 (bio19,
prec_win), where we extracted climate values for the
coordinates for each dataset_id retrieving the full time series of every
100 years. In addition, we downloaded the monthly climatology for daily
maximum near-surface temperature K/10 (tasmin).
The whole project is relative complex, therefore we included a conceptual workflow figure (Figure 1). Note: Figure is work in progress and will be improved with graphic designer.
To test if the ecological processes have changed due to past human
activity in single records, we use reduced rank multivariate regression.
This is also known as distance-based redundancy analysis (db-RDA). We
used the R package rdacca.hp to run hierarchial variation
partitioning with several predictors. This estimates the variation per
variables in different combinations to get the average variable
importance independent of the order of predictors. db-RDA was performed
using Gower-distances adding a constant because our response
data, the PAPS, is a mixture of different units.
Depending on the type of spatial or temporal analysis for hypothesis 1, the explanatory variables are either past human impact (SPDs), palaeoclimatic variables, and/or time (see below).
SPD is the variable of main interest as it represent past human presence. The palaeoclimate is a matrix of summer precipitation, winter precipitation, annual temperatures, and winter temperatures. These are selected as we considered them most relevant to represent major differences in climatic conditions in all the regions (in respect to differences in warm, cold, dry, wet, or regions with high seasonality). Time is represented by the ages of each pollen record, however, this is more difficult to interpret. We assume age may represent time dependent changes such as natural successions and/or ecological changes due to interaction between taxa.
In the hierarchichal variation partitioning analysis, the predictor variables can be applied either as individual predictors or as groups of predictors. In our case, we run the analysis with groups of predictors. This means that the palaeoclimatic variables are included as one matrix and not assessed as individual predictors. (The overall results is not very different from using individual predictors).
The analysis is run in two different ways:
We create a single model per each recods. We then use the adjusted r2 to assess the goodness-of-fit of models. Adjusted r2 is the modified version of r2 that corrects for the number of samples and predictors in the model. It is calculated using the Ezekiel formula adjr2 = 1 - (1- r2)*(n-1)/ (n-m-1), where n = number of samples in the dataset, and m is the number of variables. In vegan, these are so called semipartial r2 (Legendre et al. 2011). Negative adjusted r2 were replaced by 0 in our analysis. The assessment of a satisfactory model fit is contingent upon the underlying data characteristics. Thus, we considered the range of adjusted r2 and removed models with a total fit falling below the 5% lower quantile threshold, specifically those with a total adj r2 below 0.121.
Then we extracted the total variation of that whole model as well as varience explained by individual predictors. There are three types of variation partition per each predictor:
We then use the total model variation to calculate the percentage of shared and unique variation explained by each predictor.
#>
#> ------------------------------------------------------------------
#> total_variance predictor predictor_unique predictor_shared
#> ---------------- ----------- ------------------ ------------------
#> 0.283 human 0.0717 0
#>
#> 0.283 climate 0.0083 0.1008
#>
#> 0.283 time 0.0321 0.0897
#> ------------------------------------------------------------------
#>
#> Table: Table: Result table of a single record for spatial hierarchichal variation partitioning (continued below)
#>
#>
#> -----------------------------------------------------------------------
#> predictor_total predictor_unique_percent predictor_shared_percent
#> ----------------- -------------------------- --------------------------
#> 0.0521 25.34 0
#>
#> 0.1091 2.933 35.62
#>
#> 0.1218 11.34 31.7
#> -----------------------------------------------------------------------
#>
#> Table: Table continues below
#>
#>
#> -------------------------
#> predictor_total_percent
#> -------------------------
#> 18.41
#>
#> 38.55
#>
#> 43.04
#> -------------------------
However, for certain records, the shared variance of predictors is much larger than the total variance of the model. This caused percentages to be very high.
#>
#> ------------------------------------------------------------------
#> total_variance predictor predictor_unique predictor_shared
#> ---------------- ----------- ------------------ ------------------
#> 0.035 human 0 0.0523
#>
#> 0.035 climate 0 0.0943
#>
#> 0.035 time 0.0162 0.1104
#> ------------------------------------------------------------------
#>
#> Table: Table: Result table of a single record for spatial hierarchichal variation partitioning (continued below)
#>
#>
#> -----------------------------------------------------------------------
#> predictor_total predictor_unique_percent predictor_shared_percent
#> ----------------- -------------------------- --------------------------
#> 0 0 149.4
#>
#> 0 0 269.4
#>
#> 0.1266 46.29 315.4
#> -----------------------------------------------------------------------
#>
#> Table: Table continues below
#>
#>
#> -------------------------
#> predictor_total_percent
#> -------------------------
#> 0
#>
#> 0
#>
#> 361.7
#> -------------------------
In these cases, the outliers above 100% were removed.
We then summarised the results per each climate zone within a continent.
Here is an example of the distribution of unique variance explained by humans for Europe (part of Figure 3).
We then estimate the the median value of variance explained by each predictor (unique and shared).
In adddition, we plot the distribution of other parts of variance for each predictor. This is part of side plots for each continents in Figure 2.
We make a new Hierarchical Variance Partition analysis, however, this time we restructure the data, so that one model is run for each 500-year time slice for each continent. (note that we fist tried it for each ecozone on a continent, but the results is relatively similar so we use the continental scale). This means that data from all records within continent for specific time slice are placed in the same model. The predictor groups in model are the past human presence and the matrix of palaeoclimatic variables (time is not included as all sample have same time). It was necessary to filter out time slices, which have less than 5 samples within continent. Additionaly, there were several models, where all the SDP values equal 0 (no human impact). Such models will fail (unable to compute).
Similar to spatial changes, unique and shared variance explained by each predictor is extracted and plotted. In the figure, each bar (cplumn) is a single model. The light grey bars are period, where we do not have data or model fail. The green color represent unique/shared variance explained by human predictor.
Finally, all those parts are combined in Figure 2.
This is the only text about it i can find anywhere. Can @vivian look into thit?
Records with scores constrained by humans when time and climate is partialled out that explains more than 10% of the variance plotted along time. Dashed vertical lines reflect MRT partitions, whereas no MRT split was detected for North America. These indicate potential change around 3-5 ka, whereas the r2 ranges between 0.3-0.61, and all models have a high prediction error (CV Error > 1) indicating very low prediction power
Here is an example for Europe:
#> X-Val rep : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000
#> Minimum tree sizes
#> tabmins
#> 1
#> 1000
Those are then combined with the distribution of variance explained unique by humans in Figure 3.
To test Hypothesis 2, that the interrelationships among pollen assemblage properties in response to humans, we integrated assemblage properties at each 500-year time slice within climate zones across continents. We used a principal component analysis (PCA) with standardisation of all indices.
Here is an example of PCA for Europe - Temperate Without dry season - 2000 yr slice:
To quantify the overall changes in the interrelationship among pollen assemblage properties, we employed symmetrical Procrustes sum-of squares (m2) to compare pairwise PCA changes within each time slice.
Here is a m2 matrix of all time slices to each other for Europe - Temperate Without dry season:
We can then plot the changes between subsequent time slices (diagonal+1 in the matrix) in relationship to the time. Here each point is a m2 value between 2 time slices, the full line represent GAM model and the dahsed line LOESS smoother. Here is an examle for Europe - Temperate Without dry season:
Because m2 matrix is on a level of climate zone within continent, we need to de determine the general patterns of past humans (as represented by SPD) and palaeoclimate within the climate zones in the continents. We used hierarchical Generalised Additive Models (HGAM) to depict their main trend.
We are aiming to recreate a general trend of predictor for our data, not for the climate zone itsef. This means we will use hGAM of individual SPD tredns and do not construct new SPD from RC within the climate zone.
Here is a general trends of SPD (Extended Data Figure 8):
Subsequently, for each continent and climate zone, we used the pairwise m2 as distance matrices in hierarchical variation partitioning to quantify the importance of humans, climate, and time in explaining the multidimensional shifts among pollen assemblage properties through time. Subsequently, the explanatory variables derived from HGAMs are integrated into hierarchical variation partitioning to gauge the extent to which human activity and/or climate shifts influenced variations in the interrelationships among pollen assemblage properties.
The visualisation is the same as H1. Here is an example for Europe:
Finally, all those parts are combined in Figure 4.